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We thoroughly enjoyed reading this excellent au- 
thoritative paper full of interesting ideas, which 
should be useful in both Bayesian and non-Bayesian 
inferences. We first discuss the accuracy of the ADM 
approximation to a Bayesian solution in a real-life 
application and then discuss how some of the ideas 
presented in the paper could be useful in a non- 
Bayesian setting. 

HOW DOES THE ADM WORK IN A REAL 
APPLICATION? 

Although the main objective of this paper is to 
make inferences on the high-dimensional parameters 
or the random effects 9i, the authors note that the 
success of the Bayesian method lies on the accurate 
estimation of the shrinkage parameters Bi since they 
appear linearly in the expressions for the posterior 
mean and posterior variance of 9i when the hyperpa- 
rameters are known. Thus, we assess the accuracy of 
the ADM approximation, given in Section 2.8, rela- 
tive to the standard first-order Laplace approxima- 
tion, in approximating the posterior distribution of 
the shrinkage factors for the hierarchical model (1)- 
(2). This model, commonly referred to as the Fay- 
Herriot model in the small area literature, was used 
by Fay and Her riot (1979) in order to combine sur- 
vey data and different administrative records in pro- 
ducing empirical Bayes estimates of per-capita in- 
come of small places. Since then the Fay-Herriot 
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model and its variants have been used in various 
federal programs such as the Small Area Income 
and Poverty Estimates (SAIPE) and the Small Area 
Health Insurance estimates (SAHIE) programs of 
the U.S. Census Bureau. 

For purposes of evaluation, we consider the prob- 
lem of estimating the proportion of 5- to 17-year- 
old (related) children in poverty for the fifty states 
and the District of Columbia using the same data 
set considered by Bell (1999). We choose two years 
(1993 and 1997) of state-level data from the SAIPE 
program. In 1993, the REML estimate of A is posi- 
tive while in year 1997 it is zero. The choice of these 
two years will thus give us an opportunity to assess 
the accuracy of the ADM approximations in two dif- 
ferent scenarios. 

We assume the standard SAIPE state-level model 
in which survey-weighted estimates of the propor- 
tions follow the two-level model given by (l)-(2). 
The survey-weighted proportions are obtained us- 
ing the Current Population Survey (CPS) data with 
their variances Vi estimated by a Generalized Vari- 
ance Function (GVF) method, following Otto and 
Bell (1995), but assumed to be known throughout 
the estimation procedure. We use the same state- 
level auxiliary variables Xi (a vector of length 5, 
i.e., r = 5), obtained from Internal Revenue Service 
(IRS) data, food stamp data and Census data that 
the SAIPE program used for the problem. We as- 
sume the uniform prior on f3 and superharmonic 
(uniform) prior on A, as used in the Morris-Tang 
paper. 

For the presentation of our results, we consider 
a selection of four states — California (CA), North 
Carolina (NC), Indiana (IN) and Mississippi (MS) — 
considered by Bell (1999). This selection represents 
both small (i.e., large Vi) and large (i.e., small Vi) 
states and thus should give us a fairly general idea of 
the degree of accuracy of the Laplace and ADM ap- 
proximations with varying Vi when compared to the 
exact posterior distributions of the shrinkage factors 
obtained by one-dimensional numerical integration. 
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Fig. 1. Plot of exact posterior density of Bi along with approximate densities using SAIPE 1993 state-level data. The four 
states in the plot are taken from Bell (1999). 



First, consider the year 1993 when the REML esti- 
mate of A is positive (1.7). The exact posterior dis- 
tributions of the shrinkage factors, ADM approxi- 
mations and the first-order Laplace approximations 
(Kass and Steffey, 1989) are plotted in Figure 1. The 
solid curves in Figures 1 and 2 are the exact poste- 
rior distributions of Bi , which are obtained from the 
posterior of A, under the prior, after multiplying by 
the Jacobian and normalizing through numerical one- 
dimensional integration. The dotted lines are first- 



order Laplace approximations to the posterior dis- 
tributions of Bi, which are simply normal distribu- 
tions with means identical to Bi with A replaced by 
its posterior mode and variance expressions given in 
Kass and Steffy (1989). Thus, the posterior means 
and variances of Bi are essentially approximated by 
the first-order Laplace method. From the plot it is 
clear that the ADM approximation is far better than 
the first-order Laplace approximation when we com- 
pare them with the exact posterior distribution of Bi. 
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Fig. 2. Plot of exact posterior densities of the shrinkage factors Bi along with approximate densities using SAIPE 1997 
data; the four states in the plot are taken from Bell (1999). 



DISCUSSION 
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Table 1 

Comparison of posterior moments based on SAIPE 
state-level data 



Posterior mean Posterior variance 



State 


Exact 


ADM 


Laplace' 


Exact 


ADM 


Laplace* 






Results based on 


1993 data 




CA 


0.47 


0.37 


0.56 


0.038 


0.023 


0.093 


NC 


0.62 


0.55 


0.72 


0.030 


0.025 


0.061 


IN 


0.80 


0.77 


0.87 


0.014 


0.014 


0.019 


MS 


0.81 


0.79 


0.89 


0.012 


0.012 


0.015 






Results based on 


1997 data 




CA 


0.68 


0.60 


1.00 


0.037 


0.041 


0.987 


NC 


0.84 


0.81 


1.00 


0.014 


0.018 


0.120 


IN 


0.87 


0.85 


1.00 


0.010 


0.013 


0.071 


MS 


0.92 


0.91 


1.00 


0.005 


0.005 


0.021 



*Laplace first-order approximation; see Kass and Steffey 
(1989) for details. 



Table 1 displays the exact posterior means and 
variances as well as their approximations for these 
states. In general, the ADM approximation appears 
to be fairly accurate with an indication of under- 
approximation of the posterior mean, especially for 
states with small Vi (CA, NC). On the other hand, 
the first-order Laplace approximation generally over- 
estimates the exact posterior means, sometimes by 
a large margin, and approximates shrinkage factors 
for all the states in the year 1997 by 1. Turning to the 
posterior variances, we observe that the first-order 
Laplace method generally over-approximates the ex- 
act posterior variances, sometimes by a large mar- 
gin, especially for the year 1997. The poor perfor- 
mance of the Laplace method, for the SAIPE 1997 
data, can be attributed to the fact that the use of 
uniform prior on A yields a posterior mode that lies 
on the boundary. The ADM approximation appears 
to perform well for both the years, especially for 
1997 when the Laplace method breaks down. For 
the year 1993, the ADM method appears to slightly 
under- approximate the exact posterior variances, es- 
pecially for the states with small Vi. Overall, it ap- 
pears that the accuracy of the ADM approximation 
depends somewhat on the states — the larger the Vi 
the better the approximation accuracy. 

We expect the accuracy of the Laplace approxi- 
mation to depend on the specific prior used for A. 
In addition, the quality of both first- and second- 
order Laplace approximations seems to depend on k 
and the Vi/A values. For our SAIPE data analyses, 
we also tried second-order Laplace approximations 



for both the years (not reported here). The second- 
order Laplace approximation generally improves on 
the accuracy for states with large Vi when the pos- 
terior mode is strictly positive. However, when the 
posterior mode is on the boundary (e.g., for the 
year 1997), the Laplace second-order approximation 
produces undesirable results, such as i?j > 1, nega- 
tive posterior variance, etc. So we could not even 
produce the graphs. For asymptotic expansions of 
posterior expectations when the posterior mode is 
on the boundary, one might need to consider ap- 
proaches outlined in Erkanli (1994); this can be tried 
in the future. But even then we believe that for 
small k the Laplace method may not perform well 
in presence of extreme skewness. 

One important step in approximating the poste- 
rior distributions used by Morris and Tang involves 
finding the ALM (Adjustment for Likelihood Maxi- 
mization) estimator of A by maximizing the product 
of the REML likelihood L[A) and a universal adjust- 
ment factor A applicable to all the states primarily 
to avoid a zero estimate of A. Given the above data 
analyses, is there any need to find different adjust- 
ment factors, possibly depending on the Vi, when 
approximating the posterior of Bil 

HOW MAY THE ADM METHOD BE USEFUL 
IN A NON-BAYESIAN PARADIGM? 

While the method proposed in the paper under 
discussion is essentially Bayesian with an innovative 
simple way to approximate the exact Bayesian so- 
lution, one could use some of the ideas presented 
in the paper in non-Bayesian approaches like the 
empirical best linear unbiased prediction (EBLUP) 
widely used in small area estimation. To elaborate 
on this point, first note that the two-level model, 
given by (l)-(2), can be viewed as the following sim- 
ple linear mixed model: 

Ui = Xif3 + Ui + ei, 

where {ui}, area-specific random effects, and {cj}, 
sampling errors, are independently distributed with 
Ui'^N{0,A) and a N{0,V),i = I, ■ ■ ■ ,k. 

The Bayes estimator of 9i = x[(3 + Ui, as approxi- 
mated by the ADM method, is identical to an EBLUP 
of 9i when the ALM estimator of A is used in place 
of REML, ML or other standard variance compo- 
nent estimators. The results on the frequentist cov- 
erage (i.e., conditional on the hyperparameters (3 
and A) of the approximate Bayesian intervals of 9i 
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presented in the Morris-Tang paper should be en- 
couraging to the non-Bayesians. However, from a fre- 
quentist perspective, the interesting problem of es- 
tablishing the second-order accuracy of coverage 
along the lines of Li and Lahiri (2010) remains open. 
Morris and Tang suggested an approximation to the 
posterior variance of 9i as a measure of uncertainty 
of their point estimate. However, since their point 
estimate of 9i can be viewed as an EBLUP, one may 
suggest the Morris-Tang measure of uncertainty to 
estimate the traditional mean squared error (same 
as the integrated Bayes risk, conditional on the hy- 
perparameters) as described in Jiang and Lahiri 
(2006). It is not, however, clear if the usual second- 
order unbiasedness criterion, advocated by the non- 
Bayesians, would be satisfied by the approximate 
posterior variance formula given in (58) of the Mor- 
ris-Tang paper. We refer the readers to Rao (2003) 
and Jiang and Lahiri (2006) for a review of the non- 
Bayesian methods. 

The standard variance component estimation me- 
thods such as the REML and ML, despite their 
good asymptotic properties, frequently yield zero es- 
timates of the unknown variance component A. This 
is a lingering problem in the classical variance com- 
ponent literature. For the model (l)-(2), the simu- 
lation results given in Li and Lahiri (2010) suggest 
that the percentage of zero estimates by the REML 
method depends on several factors, including the 
variation of the ratios Vi/A across the small areas 
and the value of k. Li and Lahiri (2010) obtained 
an adjusted maximum likelihood (AML) by multi- 
plying the profile likelihood, as given by Lp{A) in 
Section 2 of Li and Lahiri (2010), by an adjustment 



factor A. This translates to the following adjustment 
factor: 

h{A) = A\X'Dyl^X\^/^ 

for the corresponding residual likelihood, given in 
Section 2 of Li and Lahiri (2010). Note that h{A) 
differs from the adjustment factor A suggested in 
the paper under discussion by an additional factor 

\x'Dyi^x\y^. 

In the context of estimating the shrinkage fac- 
tors Bi, simulation results of Li and Lahiri (2010) in- 
dicate lower biases of the shrinkage estimators when 
the Li-Lahiri adjustment factor is used, where the 
bias is defined with respect to the marginal distri- 
bution of y, given the hyperparameters /3 and A. In 
the context of a general linear mixed model, Lahiri 
and Li (2009) proposed a generalized maximum like- 
hhood (GML) method, which includes ML, REML, 
ALM and AML methods as special cases. For the 
model (l)-(2), the GML essentially maximizes 
h{A) X L{A) with respect to A, where h{A) is a gen- 
eral adjustment factor. This raises an interesting 
question: how should one choose an adjustment fac- 
tor h{A) in the GML method? 

To fix ideas, we restrict ourselves to the class of ad- 
justment factors of the form h{A) = A"^. Since Y{Bi) 
is not affected by q, up to the order o{k~^) (Lahiri 
and Li, 2009), it makes sense to choose q that pro- 
vides good properties in terms of the bias of the 
estimator. To this end, using Lahiri and Li (2009), 
we have 

Bias(^,) ^ 1 / q_\ 




Fig. 3. Plot of simulated biases of different GML estimators of Bi: the balanced case. 
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Obviously, q = 1 — Bi is the ideal choice — one that 
makes the bias/variance ratio nearly zero. While we 
cannot use this choice since A is unknown, it sug- 
gests the range [0, 1] for q. Interestingly, the REML 
corresponds to the choice q = while the Morris- 
Tang ALM corresponds to the other extreme (7=1. 
A compromise choice is q = 0.5, which corresponds 
to Bi = 0.5. In the Bayesian language, this choice 
would then correspond to the prior 7r{A) = 1/y/A, 
a prior also mentioned in the paper, since the Morris- 
Tang ADM recommends the adjustment factor A for 
any prior on A. Figure 3 displays the simulated bi- 
ases of different estimators of the shrinkage factor 
for the balanced version of model (l)-(2). In terms 
of the bias, the multiplier ^/A usually works better 
than A. 

Let us now explain how the ALM or AML method 
may help a non-Bayesian method like the parametric 
bootstrap (Chatterjee, Lahiri and Li, 2008; Li and 
Lahiri, 2010) in constructing intervals for the ran- 
dom effects 9i, which requires repeated generation 
of a pivotal quantity from several bootstrap sam- 
ples. A strictly positive estimate of A is absolutely 
needed for this method since the pivotal quantity is 
undefined when A estimate is zero. A crude fix is 
to take a small positive number whenever the esti- 
mate of A is zero. But, in a simulation study, Li and 
Lahiri (2010) observed that the coverage errors and 
also the length of the parametric bootstrap method 
could be sensitive to the choice of this positive trun- 
cation point. The ALM or AML offers a sensible so- 
lution to this problem of the parametric bootstrap 
method. Li and Lahiri (2010) showed that the use of 
ALM or AML estimator of A improves on coverage 
as well as the length of the parametric bootstrap 
interval estimate. 

In the paper under discussion, Morris and Tang 
discuss the case of a single variance parameter A. 
Pramanik (2008) extended the ADM method to the 
nested error regression model with two unknown 
variance components by noting that one of the varian- 
ce components that corresponds to the within small 
area variation can be integrated out. However, it is 
not clear how the ADM method, as proposed by 
Morris and Tang, would extend to a general linear 
mixed model with more than two variance compo- 
nents, a situation where a simple method such as 
the ADM method would be most welcome. 

We congratulate the authors for preparing an in- 
sightful and informative paper on the ADM method. 
This will surely inspire others to contribute to this 
important area of research. 
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